AD-A102 216  MISSOURI UNIV-COLUMBIA DEPT OF STATISTICS
ESTIMATION IN LATENT TRAIT MODELS. (U)

MAY 81  S E RIGDON, R K TSUTAKAWA
UNCLASSIFIED  TR-102




Steven  E.  Rigdon 
and 

Robert  K.  Tsutakawa 




Research  Report  81-1 

Mathematical  Sciences  Technical  Report  No.  102 

May  1981 


Department  of  Statistics 
University  of  Missouri 
Columbia,  MO  65211 




Prepared under contract No. N00014-77-C-0097, NR 150-395
with the Personnel and Training Research Programs
Psychological  Sciences  Division 
Office  of  Naval  Research 


Approved  for  public  release;  distribution  unlimited. 
Reproduction  in  whole  or  in  part  is  permitted  for 
any purpose of the United States Government.




Unclassified
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE

1. REPORT NUMBER
Research Report 81-1

4. TITLE
ESTIMATION IN LATENT TRAIT MODELS

7. AUTHOR(s)
Steven E. Rigdon and Robert K. Tsutakawa

8. CONTRACT OR GRANT NUMBER
N00014-77-C-0097

9. PERFORMING ORGANIZATION NAME AND ADDRESS
Department of Statistics
University of Missouri
Columbia, Missouri 65211

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS
NR 150-395

11. CONTROLLING OFFICE NAME AND ADDRESS
Personnel and Training Research Programs
Office of Naval Research (Code 458)
Arlington, VA 22217

12. REPORT DATE
May 1981

13. NUMBER OF PAGES
39

15. SECURITY CLASS. (of this report)
Unclassified

16. DISTRIBUTION STATEMENT (of this Report)

Approved for public release; distribution unlimited. Reproduction
in whole or in part is permitted for any purpose of the United
States Government.


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


18. SUPPLEMENTARY NOTES


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Latent trait model, EM algorithm, prior distribution, posterior
distribution, Rasch model.


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Estimation  of  ability  and  item  parameters  in  latent  trait 
models  is  discussed.  When  both  ability  and  item  parameters  are 
considered  fixed  but  unknown,  the  method  of  maximum  likelihood 
for  the  logistic  or  probit  models  is  well  known.  This  paper 
discusses  techniques  for  estimating  ability  and  item  parameters 
when  the  ability  parameters,  or  item  parameters  (or  both)  are 
considered random. When the item parameters are considered fixed
and the ability parameters are random, from some prior
distribution with fixed but unknown parameters, the EM algorithm
is applied. A modification of the EM algorithm, which requires
considerably less computation, is proposed. When both ability
and item parameters are considered random, the EM algorithm
seems to be impractical because the amount of computation
needed is very large. In this case another modification to
the EM algorithm is proposed. One advantage to using prior
distributions is that parameter estimates usually exist in
situations where the maximum likelihood estimates do not.

These  methods  are  applied  to  the  one  parameter  logistic  or 
Rasch  model  and  numerically  compared  using  several  sets  of 
simulated  data.  It  appears  very  likely  that  most  of  the 
methods  discussed  here  can  be  readily  extended  to  the  two  and 
three  parameter  logistic  or  probit  model. 

Estimation in Latent Trait Models


1.  INTRODUCTION 


Given that we have $n$ subjects and $k$ test items, consider
binary responses $Y_{ij}$, $i = 1, \ldots, n$; $j = 1, \ldots, k$,
where $Y_{ij} = 0$ or $1$ depending on whether the $i$th subject's
response to item $j$ is incorrect or correct. Let

\[ p_{ij} = 1 - q_{ij} = P(Y_{ij} = 1 \mid \theta_i, \beta_j) \tag{1.1} \]

be a model for responses, where $\theta_i$ is the ability (or latent
trait) parameter of the $i$th subject and $\beta_j$ (possibly vector
valued) is the item parameter of the $j$th item. Given
$\theta = (\theta_1, \ldots, \theta_n)$ and
$\beta = (\beta_1, \ldots, \beta_k)$ we assume conditional
independence among responses, $Y = ((Y_{ij}))$, so that

\[ P(Y = y \mid \theta, \beta) = \prod_{i=1}^{n} \prod_{j=1}^{k} p_{ij}^{y_{ij}} \, q_{ij}^{1 - y_{ij}} \tag{1.2} \]

We wish to consider estimates of $\theta$ and $\beta$ together with
measures of uncertainty in these estimates. For this purpose we
introduce additional structures to the model, depending on whether
we treat $\theta$ or $\beta$ (or possibly both) as fixed parameters
or random with an unknown prior distribution. In the terminology
commonly used in linear models analysis, we may classify the various
models as shown in the following table.


                                  $\theta$

                        Fixed              Random

           Fixed    Fixed Effects      Mixed Effects
                       Models             Models
$\beta$
           Random   Mixed Effects      Random Effects
                       Models             Models

Most of the currently available techniques are for the fixed
effects models, where the use of maximum likelihood for the logistic
and probit models is well known (Wright and Panchapakesan 1969 and
Wainer et al. 1980).

In dealing with random parameters, we shall assume that their
distributions belong to certain exponential families with unknown
parameters. In particular we let $\phi_1$ and $\phi_2$ denote the
parameters of the prior distribution for $\theta$ and $\beta$
respectively, where $\phi_1$ or $\phi_2$ (or both) may be vector
valued. When $\theta$ and $\beta$ are both random, we will further
assume that they may be treated as independent random samples.

2.  ESTIMATION  VIA  THE  EM  ALGORITHM 

One general approach to estimating $\theta$ and $\beta$ for the
random effects and mixed effects models is the EM algorithm (Dempster,
Laird and Rubin 1977). The difficulty in using the EM algorithm
in practice depends very much on the model. The difficulties are
primarily due to the fact that the joint distribution of
$(Y, \theta, \beta)$ does not belong to an exponential family. We
will discuss some of the difficulties and propose modifications which
can be used to obtain estimates for the different models.

One way to view the EM algorithm is to consider certain
parameters as nuisance parameters and integrate them out so that
we are left with a likelihood function of the parameters of
interest, which we can then try to maximize. The maximization is
carried out iteratively, by successively maximizing a function
of certain unobserved sufficient statistics which are estimated
by their conditional expectations given preliminary estimates
of the unknown parameter.

2.1  EM  Algorithm  Applied  to  Mixed  Models  (MLF) 

Suppose we are given $k$ items with parameters
$\beta = (\beta_1, \ldots, \beta_k)$ which we consider fixed, and a
random sample of subjects with abilities
$\theta = (\theta_1, \ldots, \theta_n)$, selected from a prior
distribution with parameter $\phi_1$. In this case, $(\beta, \phi_1)$
may be considered the parameters to be estimated by the EM algorithm
and $\theta$ an unobserved random variable with sufficient statistic
$T_1$.

Starting with some initial estimate $(\beta^{(0)}, \phi_1^{(0)})$ for
$(\beta, \phi_1)$, the algorithm repeats the following E and M steps
for $v = 0, 1, \ldots$ until a convergence criterion is met.

E Step: Given $(\beta^{(v)}, \phi_1^{(v)})$, compute the posterior
expectation of $T_1$,

\[ t_1^{(v+1)} = E(T_1 \mid Y, \beta^{(v)}, \phi_1^{(v)}) . \]

M Step: Compute the value of $(\beta^{(v+1)}, \phi_1^{(v+1)})$ which
maximizes

\[ E\big( \log f(Y, \theta \mid \beta^{(v+1)}, \phi_1^{(v+1)}) \mid Y, \beta^{(v)}, \phi_1^{(v)} \big) \]

where $f(Y, \theta \mid \beta^{(v+1)}, \phi_1^{(v+1)})$ is the joint
probability density function of $(Y, \theta)$ given
$(\beta^{(v+1)}, \phi_1^{(v+1)})$.

The MLF procedure is based on the same principle as the MLF
procedure for linear mixed models with normally distributed random
variables discussed by Dempster, Rubin and Tsutakawa (1981).

One modification of the MLF procedure is replacing the above
M Step by the following

M Step: Compute the maximum likelihood estimate of $(\beta, \phi_1)$
using $t_1^{(v+1)}$ in lieu of $T_1$, with $\theta$ fixed at its
posterior expectation given $(\beta^{(v)}, \phi_1^{(v)})$.

Because this procedure conditions on the posterior expectation of
$\theta$ given $(\beta^{(v)}, \phi_1^{(v)})$ each time through the
iteration, we denote this procedure by CMLF.

We note that Sanathanan and Blumenthal (1978) use the EM
algorithm to obtain estimates of the item and ability parameters for
mixed effects situations. However, their procedure is somewhat
different and is based on first obtaining conditional maximum
likelihood (CML) estimates for $\beta$ conditional on the observed
frequency distribution of raw scores, and then applying the EM
algorithm to estimate $\theta$ while keeping $(\beta, \phi_1)$ fixed.
It appears unlikely that this method generalizes to more complex
models, since such conditional maximum likelihood estimates exist
because of special properties of the Rasch model.

2.2  EM  Algorithm  Applied  to  Random  Effects  Models 

Suppose we are given a random sample of item parameters
$\beta = (\beta_1, \ldots, \beta_k)$ with prior distribution having
unknown parameter $\phi_2$, and a random sample of subjects with
ability parameter $\theta = (\theta_1, \ldots, \theta_n)$ with prior
distribution having unknown parameter $\phi_1$. Let $T_1$ and $T_2$
denote the sufficient statistics for $\phi_1$ and $\phi_2$
respectively. These statistics are unobserved, but are finite
dimensional when the prior distributions belong to exponential
families.

In order to apply the EM algorithm, we begin with some initial
estimate of $(\phi_1, \phi_2)$, then compute in the E step,

\[ (t_1, t_2) = E(T_1, T_2 \mid y, \phi_1, \phi_2) \tag{2.1} \]

and, for the M step, maximize the likelihood function, for
$(\theta, \beta)$, with respect to $\phi_1$ and $\phi_2$, with the
posterior expectation (2.1) used in place of $(T_1, T_2)$.

However, for all of the latent trait models we have considered,
the evaluation of (2.1) requires the numerical evaluation of
multiple integrals of order exceeding $n$ and $k$. The reason
for this is that the marginal posterior of $\theta_i$ and $\beta_j$
must be obtained through the likelihood function (1.2), which does
not factor into a form suitable for low order integration.

We note however that it is considerably easier to compute the
posterior expectation of $T_1$ when we are given $\beta$, and the
posterior expectation of $T_2$ given $\theta$. We have thus modified
the EM algorithm as follows.

Start with some initial value
$(\beta^{(0)}, \phi_1^{(0)}, \phi_2^{(0)})$ for
$(\beta, \phi_1, \phi_2)$ and repeat the following for
$v = 0, 1, \ldots$, until a convergence criterion is satisfied.

E$_1$ Step: Compute

\[ \theta^{(v+1)} = E(\theta \mid Y, \phi_1^{(v)}, \beta^{(v)}) \tag{2.2} \]

\[ t_1^{(v+1)} = E(T_1 \mid Y, \phi_1^{(v)}, \beta^{(v)}) \tag{2.3} \]

E$_2$ Step: Compute

\[ \beta^{(v+1)} = E(\beta \mid Y, \phi_2^{(v)}, \theta^{(v+1)}) \tag{2.4} \]

\[ t_2^{(v+1)} = E(T_2 \mid Y, \phi_2^{(v)}, \theta^{(v+1)}) \tag{2.5} \]

M$_1$ Step: Compute $\phi_1^{(v+1)}$, the maximum likelihood estimator
of $\phi_1$ using $t_1^{(v+1)}$ in place of $T_1$.

M$_2$ Step: Compute $\phi_2^{(v+1)}$, the maximum likelihood estimator
of $\phi_2$ using $t_2^{(v+1)}$ in place of $T_2$.

If convergence is attained the terminal value of
$(\theta^{(v)}, \beta^{(v)}, \phi_1^{(v)}, \phi_2^{(v)})$ will satisfy
the consistency conditions

\[ E(T_1 \mid y, \phi_1, \beta) = E(T_1 \mid \phi_1) \tag{2.6} \]

\[ E(T_2 \mid y, \phi_2, \theta) = E(T_2 \mid \phi_2) \tag{2.7} \]

Note that equation (2.3) is similar to the E Step of the MLF
procedure for the mixed model, with the exception that we condition
on the posterior expectation of $\beta$ rather than on the maximum
likelihood estimate.

The estimates $(\phi_1, \phi_2)$ thus obtained are not true maximum
likelihood estimates, which would result if straight EM were
possible. Because of the conditional nature of this solution, and
because both $\theta$ and $\beta$ are random, we denote this
procedure by CMLR.

The assumption that $\beta$ is a random sample from some common
distribution could be unrealistic when item pools are deliberately
organized to contain a wide spectrum of difficulties or when other
differences are present. One Bayesian solution to this problem
is to consider a uniform prior distribution on each $\beta_j$ where
the range is, in principle, finite but very large. Using an
algorithm similar to CMLR, the posterior distribution of $\beta$
(conditional on $\theta$) can be computed and used to compare
different items. This procedure will be denoted by CMLU and is
illustrated below.


3.  APPLICATION  OF  EM  ALGORITHM  TO  RASCH  MODEL 

Given $\theta_i$ and $\beta_j$, the Rasch model, or one parameter
logistic model, gives the probability distribution of $Y_{ij}$ as

\[ P(Y_{ij} = y_{ij} \mid \theta_i, \beta_j) = \frac{\exp(y_{ij}(\theta_i - \beta_j))}{1 + \exp(\theta_i - \beta_j)} , \qquad y_{ij} = 0, 1 . \]

In the Rasch model, $\theta_i$ is called the ability parameter and
$\beta_j$ is called the item or difficulty parameter. Assuming
conditional independence among the responses $Y = ((Y_{ij}))$, the
probability distribution of $Y$ can be written as

\[ P(Y = y \mid \theta, \beta) = \prod_{i=1}^{n} \prod_{j=1}^{k} \frac{\exp(y_{ij}(\theta_i - \beta_j))}{1 + \exp(\theta_i - \beta_j)} = \frac{\exp\left( \sum_{i=1}^{n} r_i \theta_i - \sum_{j=1}^{k} q_j \beta_j \right)}{\prod_{i=1}^{n} \prod_{j=1}^{k} (1 + \exp(\theta_i - \beta_j))} \tag{3.1} \]

where $r_i$ is the raw score of the $i$th examinee defined by

\[ r_i = \sum_{j=1}^{k} y_{ij} \]

and $q_j$ is the item score for the $j$th item defined by

\[ q_j = \sum_{i=1}^{n} y_{ij} . \]
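
As a quick numerical check on (3.1), the following Python sketch evaluates the Rasch log-likelihood both cell by cell and through the raw scores $r_i$ and item scores $q_j$. The function names and the use of NumPy are assumptions made for illustration; they are not part of the report.

```python
import numpy as np

def rasch_loglik_direct(y, theta, beta):
    """Log of the product form of (3.1), summed over all (i, j) cells."""
    eta = theta[:, None] - beta[None, :]           # theta_i - beta_j
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def rasch_loglik_scores(y, theta, beta):
    """Log of the second form of (3.1), using only r_i and q_j."""
    r = y.sum(axis=1)                              # raw scores r_i
    q = y.sum(axis=0)                              # item scores q_j
    eta = theta[:, None] - beta[None, :]
    return float(r @ theta - q @ beta - np.sum(np.logaddexp(0.0, eta)))
```

Both functions should agree to rounding error for any 0/1 response matrix, which reflects the fact that the data enter (3.1) only through the raw scores and the item scores.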


3.1.  MLF  Estimation 

For MLF, we assume that $\theta_1, \ldots, \theta_n$ form a random
sample of size $n$ from the normal distribution with mean $\mu$ and
variance $\sigma^2$, where $\mu$ and $\sigma^2$ are fixed but unknown
quantities. The difficulty parameters
$\beta = (\beta_1, \ldots, \beta_k)$ are also assumed to be fixed but
unknown. Since $\theta_1, \ldots, \theta_n$ are assumed independent,
the prior distribution $p(\theta \mid \mu, \sigma)$ of
$\theta = (\theta_1, \ldots, \theta_n)$ is the product of $n$ normal
distributions, each with mean $\mu$ and variance $\sigma^2$. From
(3.1), the likelihood function of $\theta$, given $\beta$, is

\[ \ell(\theta \mid y, \beta) = P(Y = y \mid \theta, \beta) . \]

Combining the prior distribution of $\theta$,
$p(\theta \mid \mu, \sigma)$, with the likelihood function of
$\theta$, $\ell(\theta \mid y, \beta)$, we can obtain the posterior
distribution of $\theta$, given $\beta$, which is

\[ p(\theta \mid y, \mu, \sigma, \beta) = H \, p(\theta \mid \mu, \sigma) \, \ell(\theta \mid y, \beta) \tag{3.2} \]

where $H$ is the normalizing constant chosen such that the
expression on the right side of (3.2) integrates to one. By
integrating (3.2) with respect to
$\theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_n$,
we can obtain the marginal posterior distribution of $\theta_i$,
which can be written as

\[ p(\theta_i \mid y, \mu, \sigma, \beta) = \frac{H_i \exp\left( -(\theta_i - \mu)^2 / 2\sigma^2 + r_i \theta_i \right)}{\prod_{j=1}^{k} \left( 1 + e^{\theta_i - \beta_j} \right)} \]

where $H_i$ is the appropriate normalizing constant.

The estimation of ability and difficulty parameters proceeds
as follows. Begin with an initial set of estimates,
$\beta^{(0)} = (\beta_1^{(0)}, \ldots, \beta_k^{(0)})$, for the item
parameters, and initial estimates $\mu^{(0)}$ and $\sigma^{(0)}$ for
$\mu$ and $\sigma$ respectively. A convenient choice for initial
estimates of the difficulty parameters is the negative of the
standardized item scores. Then for $v = 0, 1, \ldots$, until a
convergence criterion is satisfied, repeat the E and M steps.
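
The suggested starting values for the difficulty parameters can be sketched as follows. The function name is hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption, since the report does not say which form of standardization was used.

```python
import numpy as np

def initial_difficulties(y):
    """Negative of the standardized item scores, used as beta^(0)."""
    q = np.asarray(y).sum(axis=0).astype(float)    # item scores q_j
    sd = q.std()                                   # ddof=0 is an assumption
    if sd == 0.0:
        raise ValueError("all item scores are equal; cannot standardize")
    return -(q - q.mean()) / sd
```

Items answered correctly by many examinees receive a low initial difficulty, and the starting values already sum to zero, which matches the constraint imposed after each M step.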


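E Step: Calculate $t_1 = (t_{11}, t_{12})$, where

\[ t_{11} = \sum_{i=1}^{n} \theta_{i1}^{(v+1)} \tag{3.3} \]

and

\[ t_{12} = \sum_{i=1}^{n} \theta_{i2}^{(v+1)} , \tag{3.4} \]

with

\[ \theta_{i1}^{(v+1)} = E(\theta_i \mid Y, \mu^{(v)}, \sigma^{(v)}, \beta^{(v)}) , \tag{3.5} \]

\[ \theta_{i2}^{(v+1)} = E(\theta_i^2 \mid Y, \mu^{(v)}, \sigma^{(v)}, \beta^{(v)}) . \tag{3.6} \]

M Step: Find the values of $\mu^{(v+1)}$, $\sigma^{(v+1)}$ and
$\beta^{(v+1)}$ which maximize

\[ E\big( \log p(\theta, Y \mid \mu^{(v+1)}, \sigma^{(v+1)}, \beta^{(v+1)}) \mid Y, \mu^{(v)}, \sigma^{(v)}, \beta^{(v)} \big) \tag{3.7} \]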


In  order  to  assure  uniqueness  of  the  parameterization,  after  each 
M-step,  we  standardize  the  difficulty  parameters  so  that  they  sum 
to  zero. 

Since $\exp(-(\theta_i - \mu)^2 / 2\sigma^2)$ is in the integrand in
(3.5) and (3.6), a simple change of variable will put this into a
form where Gauss-Hermite quadrature formulas for numerical
integration are suitable. To obtain the values of $\mu^{(v+1)}$ and
$\sigma^{(v+1)}$ which maximize (3.7), we differentiate (3.7) with
respect to $\mu^{(v+1)}$ and $\sigma^{(v+1)}$ and set these results
equal to zero. The integral in (3.7) can be written as the sum of a
finite number of single integrals, each of which is uniformly
convergent in $\mu^{(v+1)}$ and $\sigma^{(v+1)}$, hence moving the
differentiation operator inside the integral is valid. This yields
simple and familiar expressions for the $\mu^{(v+1)}$ and
$\sigma^{(v+1)}$ which maximize (3.7), namely,

\[ \mu^{(v+1)} = t_{11} / n \tag{3.8} \]

and

\[ \left( \sigma^{(v+1)} \right)^2 = t_{12} / n - \left( \mu^{(v+1)} \right)^2 . \tag{3.9} \]
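
The E step moments (3.5) and (3.6) and the updates (3.8) and (3.9) can be sketched in Python as below. This is a minimal illustration, not the authors' program: the function names, the default of 40 quadrature nodes, and the substitution $\theta = \mu + \sqrt{2}\,\sigma x$ (which turns the normal prior into the Gauss-Hermite weight $e^{-x^2}$) are assumptions.

```python
import numpy as np

def theta_posterior_moments(r, beta, mu, sigma, n_nodes=40):
    """Posterior mean and second moment of each theta_i, as in (3.5)-(3.6)."""
    r = np.asarray(r, dtype=float)
    beta = np.asarray(beta, dtype=float)
    x, w = np.polynomial.hermite.hermgauss(n_nodes)    # weight exp(-x^2)
    theta = mu + np.sqrt(2.0) * sigma * x              # change of variable
    # log prod_j (1 + exp(theta - beta_j)) at every node
    log_den = np.logaddexp(0.0, theta[:, None] - beta[None, :]).sum(axis=1)
    log_g = r[:, None] * theta[None, :] - log_den[None, :]
    log_g -= log_g.max(axis=1, keepdims=True)          # guard against overflow
    wg = w[None, :] * np.exp(log_g)
    norm = wg.sum(axis=1)                              # plays the role of 1/H_i
    m1 = (wg * theta[None, :]).sum(axis=1) / norm
    m2 = (wg * theta[None, :] ** 2).sum(axis=1) / norm
    return m1, m2

def update_mu_sigma(m1, m2):
    """Closed-form updates (3.8) and (3.9) from t11 and t12."""
    n = len(m1)
    t11, t12 = m1.sum(), m2.sum()
    mu = t11 / n
    sigma = np.sqrt(t12 / n - mu ** 2)
    return mu, sigma
```

Because the posterior of $\theta_i$ depends on the data only through $r_i$, examinees with the same raw score share the same moments, so in practice the quadrature needs to be run at most once per distinct raw score.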


To find the $\beta$ that maximizes (3.7), we differentiate (3.7) with
respect to $\beta_j^{(v+1)}$, $j = 1, \ldots, k$, and set these
results equal to zero. Again, it is valid to differentiate inside the
integral, but now we cannot get a closed form expression for
$\beta_j^{(v+1)}$. Instead, we get $k$ nonlinear equations

\[ -q_j + \sum_{i=1}^{n} \int_{-\infty}^{\infty} \frac{\exp(\theta_i - \beta_j^{(v+1)})}{1 + \exp(\theta_i - \beta_j^{(v+1)})} \, p(\theta_i \mid y, \mu^{(v)}, \sigma^{(v)}, \beta^{(v)}) \, d\theta_i = 0 , \tag{3.10} \]

$j = 1, \ldots, k$. These equations can be solved one at a time by the
secant method described in Conte and deBoor (1972).
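
A single equation of the form (3.10) can be solved with a plain secant iteration, sketched below. The sketch assumes the posterior of every $\theta_i$ has already been discretized on a common set of nodes: `theta_nodes` holds the nodes and `post_w` holds one row of normalized posterior weights per examinee. These names, the starting points, and the stopping rule are illustrative assumptions, not details taken from the report or from Conte and deBoor.

```python
import numpy as np

def secant(f, x0, x1, tol=1e-10, max_iter=100):
    """Basic secant iteration for a root of f, started from x0 and x1."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:                       # flat step: cannot form the secant
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
        if abs(x1 - x0) < tol:
            break
    return x1

def solve_beta_j(q_j, theta_nodes, post_w, start=0.0):
    """Root of the left side of (3.10) for one item, integral as a weighted sum."""
    def g(b):
        p = 1.0 / (1.0 + np.exp(-(theta_nodes - b)))   # P(correct) at each node
        return -q_j + float(np.sum(post_w * p[None, :]))
    return secant(g, start, start + 0.5)
```

The left side of (3.10) is strictly decreasing in $\beta_j$, so there is at most one root; the secant iteration is not globally convergent, however, and a poor starting pair may need to be replaced by a bracketing method.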


3.2  CMLF  Applied  to  Rasch  Model 
The MLF procedure can be modified slightly by doing the
following. As before, begin with initial estimates $\mu^{(0)}$,
$\sigma^{(0)}$ and $\beta^{(0)}$ for $\mu$, $\sigma$ and $\beta$.
Then, until a convergence criterion is satisfied, for
$v = 0, 1, \ldots$, repeat the following steps:

E Step: Calculate $t_1 = (t_{11}, t_{12})$ as in (3.3) and (3.4).

M$_1$ Step: Using $\theta^{(v+1)}$ as the actual values of $\theta$,
calculate the maximum likelihood estimate of $\beta$.

M$_2$ Step: Set $\mu^{(v+1)}$ and $(\sigma^{(v+1)})^2$ equal to the
values given in (3.8) and (3.9) respectively.

After each M$_2$ step, we standardize the item parameters so that they
sum to zero. To do the M$_1$ step, we find the log-likelihood function
of $\beta$ given $y$ and $\theta^{(v+1)}$ to be

\[ L(\beta \mid y, \theta^{(v+1)}) = \sum_{i=1}^{n} \theta_i^{(v+1)} r_i - \sum_{j=1}^{k} \beta_j q_j - \sum_{i=1}^{n} \sum_{j=1}^{k} \log\left( 1 + \exp(\theta_i^{(v+1)} - \beta_j) \right) \tag{3.11} \]

Differentiating (3.11) with respect to $\beta_j$, and setting the
result equal to zero yields a nonlinear equation whose root is the
maximum likelihood estimate of $\beta_j$, when $\theta$ is given.
That is, we numerically solve the equation

\[ \frac{dL}{d\beta_j} = -q_j + \sum_{i=1}^{n} \left( 1 + \exp(\beta_j - \theta_i^{(v+1)}) \right)^{-1} = 0 \]

for $\beta_j$. If $q_j$ is not zero or $n$, then this equation will
have a unique solution, since the sum on the left decreases strictly
from $n$ to zero as $\beta_j$ increases.
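
The M$_1$ step can be sketched as a vectorized root search over all items at once. The report does not name the root finder used for this step; bisection is substituted here because the score is monotone in $\beta_j$. The function name and the assumption that every root lies in $[-30, 30]$ are illustrative.

```python
import numpy as np

def mle_beta_given_theta(q, theta, lo=-30.0, hi=30.0, n_iter=100):
    """Solve -q_j + sum_i 1/(1 + exp(beta_j - theta_i)) = 0 for every item."""
    q = np.asarray(q, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    if np.any((q <= 0) | (q >= n)):
        raise ValueError("item scores of 0 or n have no finite estimate")
    lo = np.full(q.shape, lo)
    hi = np.full(q.shape, hi)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        p = 1.0 / (1.0 + np.exp(mid[:, None] - theta[None, :]))
        score = -q + p.sum(axis=1)         # dL/dbeta_j evaluated at mid
        go_right = score > 0               # score decreases in beta_j
        lo = np.where(go_right, mid, lo)
        hi = np.where(go_right, hi, mid)
    return 0.5 * (lo + hi)
```

After this step the estimates would be centered, for example with `beta - beta.mean()`, to impose the sum-to-zero constraint described above.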


3.3  CMLR  Applied  to  Rasch  Model 

Suppose now that $\theta_1, \ldots, \theta_n$ is a random sample from
the normal distribution with mean $\mu$ and variance $\sigma^2$, and
$\beta_1, \ldots, \beta_k$ is a random sample from the normal
distribution with mean zero and variance $\tau^2$. Again, we start
with initial estimates $\beta^{(0)}$, $\mu^{(0)}$, $\sigma^{(0)}$ and
$\tau^{(0)}$ for $\beta$, $\mu$, $\sigma$ and $\tau$ respectively. For
$v = 0, 1, \ldots$, until a convergence criterion is reached, we
repeat the following steps:

E$_1$ Step: Calculate $t_1 = (t_{11}, t_{12})$ as in (3.3) and (3.4).

E$_2$ Step: Calculate $t_2 = (t_{21}, t_{22})$ by

\[ t_{21} = \sum_{j=1}^{k} \beta_{j1}^{(v+1)} , \qquad t_{22} = \sum_{j=1}^{k} \beta_{j2}^{(v+1)} , \]

where

\[ \beta_{j1}^{(v+1)} = E(\beta_j \mid Y, \tau^{(v)}, \theta^{(v+1)}) \tag{3.12} \]

and

\[ \beta_{j2}^{(v+1)} = E(\beta_j^2 \mid Y, \tau^{(v)}, \theta^{(v+1)}) . \tag{3.13} \]

M Step: Set $\mu^{(v+1)}$ and $\sigma^{(v+1)}$ equal to the values
given in (3.8) and (3.9) respectively, and set

\[ \left( \tau^{(v+1)} \right)^2 = t_{22} / k - (t_{21} / k)^2 . \]

After each M step, we standardize the item parameters so that they
sum to zero. Since $\beta_1, \ldots, \beta_k$ are independent and
normally distributed, the joint distribution is the product of $k$
normal distributions each with mean zero and variance $\tau^2$.

Combining the likelihood function of $\beta$ with the prior
distribution of $\beta$ we obtain the posterior distribution of
$\beta$. Integrating with respect to
$\beta_1, \ldots, \beta_{j-1}, \beta_{j+1}, \ldots, \beta_k$ gives
the marginal posterior distribution of $\beta_j$ given $\theta$,

\[ p(\beta_j \mid y, \tau, \theta) = \frac{G_j \exp\left( -\beta_j^2 / 2\tau^2 - \beta_j q_j \right)}{\prod_{i=1}^{n} (1 + \exp(\theta_i - \beta_j))} \tag{3.14} \]

where $G_j$ is the appropriate normalizing constant. For evaluating
posterior moments, here again Gauss-Hermite quadrature formulas are
applicable.
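
The E$_2$ step moments (3.12) and (3.13) and the update of $\tau$ can be sketched with the same quadrature idea. The function names, the node count, and the substitution $\beta_j = \sqrt{2}\,\tau x$ are assumptions for illustration; note also that the quadrature rescaling breaks down if $\tau$ reaches zero.

```python
import numpy as np

def beta_posterior_moments(q, theta, tau, n_nodes=40):
    """Posterior mean and second moment of each beta_j under (3.14)."""
    q = np.asarray(q, dtype=float)
    theta = np.asarray(theta, dtype=float)
    x, w = np.polynomial.hermite.hermgauss(n_nodes)    # weight exp(-x^2)
    b = np.sqrt(2.0) * tau * x                         # nodes on the beta scale
    # log prod_i (1 + exp(theta_i - b)) at every node
    log_den = np.logaddexp(0.0, theta[:, None] - b[None, :]).sum(axis=0)
    log_g = -q[:, None] * b[None, :] - log_den[None, :]
    log_g -= log_g.max(axis=1, keepdims=True)          # guard against overflow
    wg = w[None, :] * np.exp(log_g)
    norm = wg.sum(axis=1)                              # plays the role of 1/G_j
    m1 = (wg * b[None, :]).sum(axis=1) / norm
    m2 = (wg * b[None, :] ** 2).sum(axis=1) / norm
    return m1, m2

def update_tau(m1, m2):
    """tau^(v+1) from t21 and t22, following the M step formula."""
    k = len(m1)
    t21, t22 = m1.sum(), m2.sum()
    return np.sqrt(t22 / k - (t21 / k) ** 2)
```

Unlike the maximum likelihood step, these moments remain finite when an item score is zero or $n$, because the normal prior supplies the tail decay that the likelihood alone lacks.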


CMLU is a limiting case of CMLR where the prior distribution
of the item parameters is taken to be uniform. When the $\beta$'s are
independent and have a uniform prior, the posterior distribution
of $\beta_j$ can be written as

\[ p(\beta_j \mid y, \tau, \theta) = \frac{F_j \exp(-\beta_j q_j)}{\prod_{i=1}^{n} (1 + \exp(\theta_i - \beta_j))} \tag{3.15} \]

where $F_j$ is the appropriate normalizing constant. If $q_j$ is not
zero or $n$, then $F_j$ can be chosen to make this integrate to one,
and also, moments of all orders exist for $\beta_j$. The estimation
procedure is similar to that of CMLR except that, first, the
posterior distribution of $\beta_j$ is taken to be the expression
given in (3.15), and second, the estimate for $\tau^{(v+1)}$ need not
be computed.
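
Posterior moments under (3.15) have no Gaussian factor, so Gauss-Hermite rescaling is less natural. The sketch below instead normalizes the density on a uniform grid. The grid half-width, grid size, and function name are assumptions, and the truncation introduces a small error when $q_j$ is close to zero or $n$, where the tails decay slowly.

```python
import numpy as np

def cmlu_posterior_moments(q_j, theta, half_width=15.0, n_grid=6001):
    """Posterior mean and variance of beta_j under the uniform-prior form (3.15)."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    if not 0 < q_j < n:
        raise ValueError("(3.15) is not integrable when q_j is 0 or n")
    b = np.linspace(-half_width, half_width, n_grid)   # uniform grid on beta_j
    log_post = -q_j * b - np.logaddexp(0.0, theta[:, None] - b[None, :]).sum(axis=0)
    wts = np.exp(log_post - log_post.max())            # unnormalized density
    wts /= wts.sum()                                   # grid spacing cancels
    mean = float(np.sum(wts * b))
    var = float(np.sum(wts * (b - mean) ** 2))
    return mean, var
```

The explicit check on `q_j` mirrors the integrability condition stated above: the density decays like $e^{-q_j \beta_j}$ on the right and like $e^{(n - q_j)\beta_j}$ on the left, so both tails vanish only when $0 < q_j < n$.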


4.  NUMERICAL  EXAMPLES 


In this section we discuss the application of these
procedures to four simulated data sets. In all four sets, the item
parameters were taken to be standard normal random variates. In
two of the data sets, denoted SI and SII, the ability parameters
were taken as standard normal random variates. In the third data
set, denoted SIII, the ability parameters were taken as a random
sample from the uniform distribution on the interval from -3 to 3.
The ability parameters for the fourth simulated data set, denoted
SIV, were taken as random variates from the Cauchy distribution,
which has probability density function

\[ f(x) = \frac{10}{\pi (1 + 100 x^2)} , \qquad -\infty < x < \infty . \]

In all four cases, each data set consisted of 100 examinees
and 45 items.
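
Data sets of this kind can be generated along the following lines. This is a sketch under stated assumptions: the function names, the seed, and the labels `"normal"`, `"uniform"` and `"cauchy"` are hypothetical, and the Cauchy scale of 0.1 is read off the $100x^2$ term in the density above. The report does not describe its random number generator, so the output will not reproduce SI through SIV.

```python
import numpy as np

def simulate_rasch(theta, beta, rng):
    """Draw a 0/1 response matrix from the Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def make_data_set(kind, n=100, k=45, seed=0):
    """Abilities from the named distribution, standard normal difficulties."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(k)
    if kind == "normal":                   # as for SI and SII
        theta = rng.standard_normal(n)
    elif kind == "uniform":                # as for SIII
        theta = rng.uniform(-3.0, 3.0, n)
    elif kind == "cauchy":                 # as for SIV, scale 0.1 assumed
        theta = 0.1 * rng.standard_cauchy(n)
    else:
        raise ValueError("unknown ability distribution: " + kind)
    return theta, beta, simulate_rasch(theta, beta, rng)
```

The raw scores `y.sum(axis=1)` of a generated matrix can then be inspected for values of 0 or 45, which are the cases where maximum likelihood estimates of ability do not exist.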

We  estimated  the  ability  and  difficulty  parameters  by  the 
five  methods:  maximum  likelihood  (ML),  MLF,  CMLF,  CMLR,  and  CMLU. 

In the data set SI, one raw score was k (45) and in data set
SIV, one raw score was zero. In these cases the maximum likelihood
estimate for the ability of the subject scoring perfectly or scoring
a zero does not exist. Thus, we did not apply ML in these two
cases.

The estimated parameters $\mu$, $\sigma$ and $\tau$ for each of the
four data sets are shown in Table 1. In some models, the three
parameters $\mu$, $\sigma$ and $\tau$ do not all appear. When this
happens, we have given the values of the appropriate sample statistic
and put these numbers in parentheses. The sample statistics of the
actual ability and item parameters are also given.

In most cases the estimates for $\mu$ and $\sigma$ obtained by the
MLF and CMLF methods were quite close to each other and quite
close to the estimates obtained by the CMLU method. The ML
estimates were somewhat close to the MLF, CMLF and CMLU estimates.

The CMLR estimates were usually quite far from the estimates
obtained by the other methods. In one extreme case, $\tau$ in the
CMLR method actually converged to zero, meaning that the estimates
of all item parameters were zero. Still, estimates for $\mu$,
$\sigma$ and $\theta$ were obtained in this case.

Figures 1 through 4 give scatter plots of the ML estimates
of $\theta$ for SII on the vertical axes, and MLF, CMLF, CMLR and
CMLU estimates on the horizontal axes. Figures 5 through 8 give
scatter plots for the corresponding item parameters.

The plots in Figures 1 through 4 show the relation between
the sets of ability estimates. The estimates obtained by ML were
more spread out than the estimates from the other four methods.
Especially noticeable is the way in which MLF, CMLF, CMLR
and CMLU pulled the estimates at the extreme ends closer to zero.

The plots in Figures 5, 6, and 8 show a nearly linear
relationship between the ML estimates and the MLF, CMLF and CMLU
estimates of the item parameters that lies on the diagonal line
through the origin. The plot of ML versus CMLR in Figure 7 shows a
nearly linear relationship, except here the CMLR estimates are much
more spread out than the ML estimates. The estimate for $\tau$ in SII
was 2.6401, which accounts for the large variation in the CMLR
estimates.

Since the data were simulated, the actual values of $\theta$ and
$\beta$ were known, so these can be compared with the estimates.
Table 2 shows the mean squared errors (MSE's) for the different
estimation techniques. An asterisk next to a value indicates that the
MSE for that method was smallest among the five methods. In most
cases the MSE's from the MLF, CMLF and CMLU were very close. In five
of the eight cases, the MSE from the CMLF was the lowest among the
five methods. The MSE's for CMLR in Table 2 are generally larger than
for other methods. This may be due to the poorer estimates of $\mu$,
$\sigma$ and $\tau$ as seen in Table 1.
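
The comparison in Table 2 rests on a simple average of squared differences between estimates and the simulated true values. A minimal sketch follows; the function name is hypothetical, and the dictionary simply copies the SI ability MSE's reported in Table 2.

```python
import numpy as np

def mse(estimate, actual):
    """Mean squared error between estimated and actual parameter values."""
    estimate = np.asarray(estimate, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((estimate - actual) ** 2))

# SI ability MSE's as reported in Table 2 (ML was not applicable for SI).
si_ability = {"MLF": 0.11486, "CMLF": 0.11473, "CMLR": 0.12955, "CMLU": 0.11481}
best = min(si_ability, key=si_ability.get)   # method that would carry the asterisk
```

For these four values `best` is `"CMLF"`, in agreement with the asterisk in the SI ability column.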

5.  SUMMARY  AND  FURTHER  REMARKS 

We have discussed several methods for estimating parameters
in the Rasch model, namely, MLF, CMLF, CMLR, and CMLU. In all four
of these methods, the ability parameter of a subject can be
estimated even when that subject scores perfectly or scores a zero,
a property not shared by maximum likelihood. If an item score for
some item is either zero or $n$, then the difficulty parameter for
this item cannot be estimated by the MLF, CMLF, or CMLU procedures.
However, this parameter can be estimated if the CMLR procedure is
used.

Since the item parameters are estimated one at a time (in
all four methods discussed here), it is feasible that these methods
could be extended to a two or three parameter logistic model. In
extending the CMLR or CMLU procedure, it is necessary to calculate
double integrals for the two parameter model and triple integrals
for the three parameter model, for each item in the test, each
time through the iteration. It might be practical to compute double
integrals; however, the computer time necessary to do triple
integrals would probably be prohibitive. On the other hand, when
extending the MLF or CMLF procedures, it is necessary to maximize
functions of two or three variables. The Newton-Raphson technique is
a practical way to do this even for a three parameter logistic model.


Table 1.  Estimates of Parameters of Prior Distribution


Data set   Method    $\mu$        $\sigma$     $\tau$

SI         ACTUAL    (-0.1177)    (1.0388)     (1.0245)
           ML        NA           NA           NA
           MLF       -0.1452      1.0460       (1.0040)
           CMLF      -0.1447      1.0407       (0.9879)
           CMLR      -0.1299      0.9092       0.4926
           CMLU      -0.1451      1.0442       (0.9991)

SII        ACTUAL    (-0.0357)    (0.9758)     (1.0496)
           ML        (-0.0953)    (1.0735)     (1.1430)
           MLF       -0.0916      0.9764       (1.1142)
           CMLF      -0.2894      0.5046       (1.0949)
           CMLR      -0.4762      0.9595       2.6401
           CMLU      -0.2904      0.5070       (1.1080)

SIII       ACTUAL    (-0.1878)    (1.8103)     (0.9381)
           ML        (-0.1704)    (1.8765)     (0.9352)
           MLF       -0.1711      1.8040       0.9071
           CMLF      -0.1710      1.8035       (0.9059)
           CMLR      -0.1517      1.5743       0.
           CMLU      -0.1712      1.8055       (0.9103)

SIV        ACTUAL    (-0.3759)    (1.1151)     (0.8796)
           ML        NA           NA           NA
           MLF       -0.2904      0.5075       (0.9252)
           CMLF      -0.2894      0.5046       (0.9110)
           CMLR      -0.4762      0.9595       2.6401
           CMLU      -0.2904      0.5070       (0.9232)

NA - method not applicable in this case.


TABLE 2

MSE's of Ability and Item Parameters


Data set   Method    Ability:                                            Item:
                     $\frac{1}{100}\sum_{i=1}^{100}(\hat\theta_i - \theta_i)^2$    $\frac{1}{45}\sum_{j=1}^{45}(\hat\beta_j - \beta_j)^2$

SI         ML        NA                                                  NA
           MLF       .11486                                              .04059*
           CMLF      .11473*                                             .04083
           CMLR      .12955                                              .30639
           CMLU      .11481                                              .04069

SII        ML        .13247                                              .06626
           MLF       .10541                                              .06049
           CMLF      .10540*                                             .05749*
           CMLR      .12145                                              .21690
           CMLU      .10542                                              .05982

SIII       ML        .20119                                              .07112
           MLF       .16138                                              .07005
           CMLF      .16123                                              .06992*
           CMLR      .21943                                              .86055
           CMLU      .16119*                                             .07002

SIV        ML        NA                                                  NA
           MLF       .31587*                                             .06347
           CMLF      .72997                                              .06142*
           CMLR      .54443                                              2.94153
           CMLU      .72788                                              .06303

NA - method not applicable in this case.

*  - method had lowest MSE among five methods.


FIGURE 7

ML vs CMLR Estimates of Difficulty Parameters